Lexical Category Acquisition as an Incremental Process
Abstract
Psycholinguistic studies suggest that children acquire robust knowledge of abstract lexical categories such as nouns, verbs and determiners early on (e.g., Gelman & Taylor, 1984; Kemp et al., 2005). Children’s grouping of words into categories might be based on various cues, including the phonological and morphological properties of a word, the distributional information about its surrounding context, and its semantic features. Among these, the distributional properties of the local context of a word have been shown to be a reliable cue for the formation of lexical categories (Redington et al., 1998; Mintz, 2003).
Several computational models have used distributional information for categorizing words (e.g., Brown et al., 1992; Schütze, 1993; Redington et al., 1998; Clark, 2000; Mintz, 2002). The majority of these models use iterative, unsupervised methods that partition the vocabulary into a set of optimum clusters (e.g., Brown et al., 1992; Clark, 2000). The generated clusters are intuitive, and can be used in different tasks such as word prediction and parsing. Moreover, these models confirm the learnability of abstract word categories, and hint at distributional cues as a useful source of information for this purpose.
The process of learning word categories by children is necessarily incremental. Human language acquisition is bounded by memory and processing limitations, and it is implausible that humans process large volumes of text at once and induce an optimum set of categories. Efficient online computational models must be developed to investigate whether distributional information is equally powerful in an online process of word categorization.
There have been only a few previous attempts at applying an incremental method to category acquisition. The model of Cartwright & Brent (1997) uses an algorithm that incrementally merges word clusters so that a Minimum Description Length criterion for a template grammar is optimized. The model treats whole sentences as contextual units, which sacrifices a degree of incrementality and makes it less robust to noise in the input. The model proposed by Parisien et al. (2008) uses a Bayesian clustering algorithm that can cope with ambiguity, and shows the developmental trends observed in children (e.g., the order of acquisition of different categories). However, their fully Bayesian implementation is computationally expensive. Moreover, when measuring the similarity between two contexts, the model is sensitive to mismatches between any pair of context features, which results in the creation of sparse clusters. To overcome this problem, they introduce a bootstrapping mechanism that improves performance but adds substantially to the computational load.
We propose an efficient incremental model for clustering words into categories based on their local context. Each word of a sentence is processed and categorized individually based on the similarity of its content (the word itself) and its context (the surrounding words) to the existing clusters. We test our model on a corpus of child-directed speech from CHILDES (MacWhinney, 2000). Over time, the model learns a fine-grained set of word categories that are intuitive and can be used in a variety of tasks. We evaluate our model on a word prediction task, where a missing word is guessed based on its context. We also use our model to infer the semantic properties of a novel word based on the context it appears in. In both tasks, we show that our induced categories outperform the part-of-speech tags used for annotating the corpus.
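The incremental procedure described above (categorize each word in turn by comparing its content and context with the existing clusters) can be illustrated with a minimal sketch. This is not the authors' model. The one-word context window, the averaged relative-frequency similarity, the fixed `threshold`, and all names (`IncrementalCategorizer`, `frames`, `add`, `predict`) are assumptions chosen only to make the idea concrete.

```python
from collections import Counter


class IncrementalCategorizer:
    """Toy online word clustering (illustrative assumption, not the paper's algorithm)."""

    def __init__(self, threshold=0.5):
        # Hypothetical cut-off: below it, a frame starts a new cluster.
        self.threshold = threshold
        # Each cluster maps a feature position to a Counter of observed values.
        self.clusters = []

    @staticmethod
    def frames(sentence):
        """Yield one frame per word: the word plus its left and right neighbour."""
        padded = ["<s>"] + list(sentence) + ["</s>"]
        for i in range(1, len(padded) - 1):
            yield {"prev": padded[i - 1], "word": padded[i], "next": padded[i + 1]}

    @staticmethod
    def similarity(frame, cluster):
        """Average relative frequency of the frame's feature values in the cluster."""
        total = 0.0
        for position, value in frame.items():
            counts = cluster[position]
            total += counts[value] / sum(counts.values())
        return total / len(frame)

    def add(self, frame):
        """Assign one frame to the most similar cluster, or open a new one."""
        best, best_sim = None, 0.0
        for index, cluster in enumerate(self.clusters):
            sim = self.similarity(frame, cluster)
            if sim > best_sim:
                best, best_sim = index, sim
        if best is None or best_sim < self.threshold:
            self.clusters.append({position: Counter() for position in frame})
            best = len(self.clusters) - 1
        for position, value in frame.items():
            self.clusters[best][position][value] += 1
        return best

    def train(self, sentences):
        """Process sentences one word at a time; return the cluster index of each word."""
        return [[self.add(frame) for frame in self.frames(s)] for s in sentences]

    def predict(self, prev_word, next_word):
        """Guess a missing word from its context via the best-matching cluster."""
        context = {"prev": prev_word, "next": next_word}
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = self.similarity(context, cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            return None
        return best["word"].most_common(1)[0][0]


model = IncrementalCategorizer(threshold=0.5)
labels = model.train([["the", "dog", "runs"], ["the", "cat", "runs"]])
guess = model.predict("the", "runs")
```

In this toy run, "dog" and "cat" end up in the same cluster because they share both neighbours, and `predict` returns one of the words stored in that cluster. A real model would need a principled similarity measure and a corpus such as the CHILDES data mentioned in the abstract.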
Publication date: 2009